LOESS regression with R

The other day, I came across a small problem: I was investigating a dataset, and the variables clearly showed non-linear behaviour.
Consequently, I could not apply classical linear regression.
One approach to this kind of problem is LOESS regression, which stands for locally weighted scatterplot smoothing.

In the following, we will work through an artificial example to make the concept clear and show how to apply it in R.

We first create a data frame with two variables, a and y, which are non-linearly related. The plot shows this relationship.
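The data can be generated like this (mirroring the full code listing at the end of the post; the cubic relationship and the noise level are arbitrary choices for the example):

y <- seq(from=1, to=10, length.out=100)
a <- y^3 + y^2 + rnorm(100, mean=0, sd=30)   # non-linear relationship plus noise
data <- data.frame(a=a, y=y)
plot(y=y, x=a)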

We perform two regressions, one linear and one loess.
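Both regressions are fitted with the formula interface; the summaries shown below come from these calls (using the variable names from the full listing at the end):

linreg <- lm(y ~ a)
summary(linreg)

loess_fit <- loess(y ~ a)
summary(loess_fit)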



## 
## Call:
## lm(formula = y ~ a)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.303 -0.635  0.294  0.792  1.635 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.100335   0.136374    22.7   <2e-16 ***
## a           0.007645   0.000305    25.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.973 on 98 degrees of freedom
## Multiple R-squared: 0.865,   Adjusted R-squared: 0.864 
## F-statistic:  630 on 1 and 98 DF,  p-value: <2e-16


## Call:
## loess(formula = y ~ a)
## 
## Number of Observations: 100 
## Equivalent Number of Parameters: 4.56 
## Residual Standard Error: 0.587 
## Trace of smoother matrix: 4.98 
## 
## Control settings:
##   normalize:  TRUE 
##   span       :  0.75 
##   degree   :  2 
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2



The scatterplot clearly shows the better fit of the loess regression.
The summaries also reveal one difference between linear and loess: the loess regression does not report coefficients, since it is a local regression. The classical measure of goodness of fit is r².
The linear model has an r² of 0.865. For the LOESS regression we have to calculate r² ourselves, following this proposal.
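One simple way to do this is to square the correlation between the observed values and the LOESS fitted values, which produces the number below:

hat <- predict(loess_fit)            # fitted values of the LOESS model
(r_sq_loess <- cor(y, hat)^2)        # squared correlation as a pseudo-r²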



## [1] 0.9528



The r² of the loess regression is 0.953, which is very good and better than the r² of the linear regression.
The plots of residuals versus fitted/predicted values also indicate a much better fit of the LOESS regression compared to the linear regression (the residuals plot of the linear regression shows clear structure, which we do not want); the QQ-plots show some issues in the tails.
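The diagnostic plots referred to here can be reproduced with the following snippet (a condensed version of the full listing at the end):

par(mfrow=c(2,2))
plot(linreg$fitted, linreg$residuals, main="classical linear regression")
plot(loess_fit$fitted, loess_fit$residuals, main="LOESS")
qqnorm(linreg$residuals, ylim=c(-2,2)); qqline(linreg$residuals)
qqnorm(loess_fit$residuals, ylim=c(-2,2)); qqline(loess_fit$residuals)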

But what do we do with it now? In the end, we probably want to predict some values from this exercise.
The predict() function helps us with this.
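For a few new values of a, the predictions of both models can be obtained like this (this is what produces the table below):

newdata <- data.frame(a=c(10, 400, 900))
newdata$linreg <- predict(linreg, newdata)
newdata$loess  <- predict(loess_fit, newdata)
newdata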



##     a linreg loess
## 1  10  3.177 2.265
## 2 400  6.158 7.133
## 3 900  9.981 9.365



Alright, this was a quick introduction on how to apply LOESS regression in R.

And here is the full code:

## @knitr create_data
y <- seq(from=1, to=10, length.out=100)
a <- y^3 + y^2 + rnorm(100, mean=0, sd=30)
data <- data.frame(a=a, y=y)
plot(y=y, x=a)

## @knitr linreg
linreg <- lm(y ~ a)
summary(linreg)

loess_fit <- loess(y ~ a)
summary(loess_fit)
scatter.smooth(data)
abline(linreg, col="blue")

## @knitr loess_fit
hat <- predict(loess_fit)                      # fitted values of the LOESS model
plot(y ~ a)
lines(a[order(a)], hat[order(a)], col="red")   # sort by a so the fitted line is drawn correctly
(r_sq_loess <- cor(y, hat)^2)                  # pseudo-r²: squared correlation of observed and fitted

## @knitr fit_loess
par(mfrow=c(2,2))
plot(linreg$fitted, linreg$residuals, main="classical linear regression")
plot(loess_fit$fitted, loess_fit$residuals, main="LOESS")
# normal probability plots
qqnorm(linreg$residuals, ylim=c(-2,2))
qqline(linreg$residuals)
qqnorm(loess_fit$residuals, ylim=c(-2,2))
qqline(loess_fit$residuals)

## @knitr predict

newdata <- data.frame(a=c(10, 400, 900))       # values of a for which we want predictions
scatter.smooth(data)
abline(linreg, col="blue")
newdata$linreg <- predict(linreg, newdata)
newdata$loess  <- predict(loess_fit, newdata)
points(x=newdata$a, y=newdata$linreg, col="blue", pch=18, cex=2)
points(x=newdata$a, y=newdata$loess,  col="red",  pch=18, cex=2)
newdata
